
add embedded expression tests to expect_ad testing framework #2837

Open
wants to merge 78 commits into
base: develop

SteveBronder
Collaborator

Summary

This adds a simple version of the expression testing framework directly into the call to expect_ad(). It helps fix #2736 by making sure that any function tested with expect_ad() is also checked for whether it is safe to pass Eigen expressions to.

The code for all of this lives in test/unit/math/expr_tests.hpp, and the main function is check_expr_test(...). This function takes any set of arguments and, for each argument that is an Eigen matrix or vector, checks whether the function is 'safe' to pass expressions to. 'Safe' here means: if a unary expression is passed as that argument, is the expression evaluated no more times than there are elements in the underlying Eigen object? To check this, the original Eigen object is wrapped with unaryExpr() in a unary expression that calls the counterOp functor from the original expression testing framework.

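#include <Eigen/Dense>
#include <iostream>

// Counts how many times a coefficient of the wrapped expression is evaluated.
template <typename Scal>
struct counterOp {
  int* counter_;
  explicit counterOp(int* counter) : counter_(counter) {}
  // Return by value so the functor stays safe on nested expressions
  // whose coefficients are temporaries.
  Scal operator()(const Scal& a) const {
    (*counter_)++;
    return a;
  }
};

int main() {
  // In the code this is used on a matrix x like so:
  Eigen::MatrixXd x = Eigen::MatrixXd::Random(3, 3);
  int call_counter = 0;
  auto x_op = x.unaryExpr(counterOp<double>(&call_counter));
  // Every coefficient access re-evaluates the lazy expression.
  double b = x_op(0);
  double c = x_op(0);
  double d = x_op(0);
  std::cout << call_counter << std::endl;  // prints "3"
  return 0;
}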

Tests

Tests can be run with

./runTests.py test/unit/math/expr_tests_test.cpp

The tests include examples for functions with one, two, and three inputs. For each combination of inputs, they check that the framework flags exactly the expected number of Eigen arguments as failing. Each set of inputs also has an associated test that runs a 'good' function, which should not fail the tests.

Side Effects

This will increase compile times a bit

Release notes

Adds a simple expression testing framework to the autodiff test framework

Checklist

  • Math issue: Document Stan Math expressions requirements #2736

  • Copyright holder: Steve Bronder

    The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
    - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
    - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass (make test-headers)
    • dependencies checks pass (make test-math-dependencies)
    • docs build (make doxygen)
    • code passes the built in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

SteveBronder and others added 30 commits October 25, 2022 11:22
…ev/math into feature/inbedded-expression-tests
@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| arma/arma.stan | 0.22 | 0.24 | 0.93 | -7.27% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.01 | 0.01 | 1.01 | 0.55% faster |
| gp_regr/gen_gp_data.stan | 0.02 | 0.02 | 1.0 | -0.09% slower |
| gp_regr/gp_regr.stan | 0.12 | 0.11 | 1.05 | 4.69% faster |
| sir/sir.stan | 84.06 | 81.71 | 1.03 | 2.8% faster |
| irt_2pl/irt_2pl.stan | 4.62 | 4.48 | 1.03 | 3.13% faster |
| eight_schools/eight_schools.stan | 0.06 | 0.05 | 1.12 | 10.39% faster |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.27 | 0.27 | 1.0 | 0.24% faster |
| pkpd/one_comp_mm_elim_abs.stan | 19.4 | 19.83 | 0.98 | -2.23% slower |
| garch/garch.stan | 0.6 | 0.5 | 1.18 | 15.14% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 3.0 | 3.0 | 1.0 | 0.27% faster |
| arK/arK.stan | 1.76 | 1.72 | 1.03 | 2.58% faster |
| gp_pois_regr/gp_pois_regr.stan | 2.78 | 2.57 | 1.08 | 7.49% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 9.81 | 9.78 | 1.0 | 0.26% faster |
| performance.compilation | 198.03 | 199.15 | 0.99 | -0.57% slower |
Mean result: 1.0286084667870055
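Reading the columns: Ratio appears to be old/new (so values above 1 mean faster), and the change column is 1 - new/old, which equals 1 - 1/Ratio. The Mean result matches the arithmetic mean of the 15 Ratio values (compilation included) to within rounding; this is inferred from the numbers, not from the benchmark script.

```latex
$$
\text{Ratio} = \frac{t_{\text{old}}}{t_{\text{new}}}, \qquad
\text{Change} = 1 - \frac{t_{\text{new}}}{t_{\text{old}}} = 1 - \frac{1}{\text{Ratio}}, \qquad
\text{Mean result} \approx \frac{1}{15}\sum_{i=1}^{15} \text{Ratio}_i
$$
```

Worked check: for eight_schools, a ratio of about 1.116 (displayed as 1.12) gives 1 - 1/1.116 ≈ 0.1039, the reported 10.39% faster. The 15 displayed ratios sum to 15.43, and 15.43/15 ≈ 1.0287 against the reported 1.0286.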

Commit hash: 14dfa9321f82f4102f9b572c68f1e7440177abf4


Machine information
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.3 LTS
Release: 20.04
Codename: focal

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 80
On-line CPU(s) list: 0-79
Thread(s) per core: 2
Core(s) per socket: 20
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
Stepping: 4
CPU MHz: 3345.083
CPU max MHz: 3700.0000
CPU min MHz: 1000.0000
BogoMIPS: 4800.00
Virtualization: VT-x
L1d cache: 1.3 MiB
L1i cache: 1.3 MiB
L2 cache: 40 MiB
L3 cache: 55 MiB
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79
Vulnerability Gather data sampling: Mitigation; Microcode
Vulnerability Itlb multihit: KVM: Mitigation: VMX disabled
Vulnerability L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Retbleed: Mitigation; IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; IBRS, IBPB conditional, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke md_clear flush_l1d arch_capabilities

G++:
g++ (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Clang:
clang version 10.0.0-4ubuntu1
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

@WardBrian
Member

A concern: this seems to nearly double the time it takes to run the mix unit tests:

A random run from #2983 took 1h31m to run the mix tests: https://jenkins.flatironinstitute.org/blue/organizations/jenkins/Stan%2FMath/detail/PR-2983/4/pipeline/209
The latest run here took 2h51m: https://jenkins.flatironinstitute.org/blue/organizations/jenkins/Stan%2FMath/detail/PR-2837/9/pipeline/209

That is already the longest part of that parallel stage, so the added time does directly impact the overall runtime of the pipeline.

@SteveBronder
Collaborator Author

I agree it takes more time, but there have been countless times when users added functions here in Stan Math that then failed upstream in stanc3, so I think it's worth having these. Also, as you can see in the PR, there were places in Stan Math that actually had small bugs we missed!

@WardBrian
Member

I think there are alternative ways of avoiding the issue only cropping up in stanc3.

The simplest I can think of is just having the expression tests (which only take ~25 minutes currently) pull not only from the stanc --dump-stan-math-signatures but also from a file like test/expressions/new_signatures.txt that users could add to and gets cleaned up whenever a release is cut. This would add at most seconds to the CI time, but require one more check box in the PR template.
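As a rough illustration of that proposal, the expression-test generator could take the union of the stanc --dump-stan-math-signatures output and the hand-maintained file. The file format assumed here (one signature per line, blank lines and # comments skipped), the function name merge_signatures, and the signature strings in the test are all hypothetical.

```cpp
#include <cassert>
#include <istream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical: union of the signatures dumped by stanc and a hand-maintained
// list. Duplicates collapse because the result is a std::set.
std::set<std::string> merge_signatures(const std::vector<std::string>& dumped,
                                       std::istream& extra) {
  std::set<std::string> all(dumped.begin(), dumped.end());
  std::string line;
  while (std::getline(extra, line)) {
    if (line.empty() || line[0] == '#') {
      continue;  // skip blank lines and comments
    }
    all.insert(line);
  }
  return all;
}
```

Because the extra file only ever adds to the set, clearing it at release time leaves the stanc-derived list untouched.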

If the automated tests here are strictly better, we could also lessen the time/resource impact by removing the previous stage for expression testing?

@SteveBronder
Collaborator Author

> The simplest I can think of is just having the expression tests (which only take ~25 minutes currently) pull not only from the stanc --dump-stan-math-signatures but also from a file like test/expressions/new_signatures.txt that users could add to and gets cleaned up whenever a release is cut. This would add at most seconds to the CI time, but require one more check box in the PR template.

I don't like this. Our PR process is already pretty bespoke, and I think this could suffer a lot from user-to-keyboard error. The full test suite takes like 12 hours to run so honestly an extra 2 hours is actually kind of fine imo. It doesn't change how long I'll wait before coming back to check whether my PR is done.

If we wanted to, we could have flags that look at which folders have code changes and only run the expression tests with double, var, or fvar types depending on that.

> If the automated tests here are strictly better, we could also lessen the time/resource impact by removing the previous stage for expression testing?

Yes! This covers a lot more of the library, though it does not cover the distributions. But I can modify the distribution code to use the expression test framework as well.

@WardBrian
Member

> The full test suite takes like 12 hours to run so honestly an extra 2 hours is actually kind of fine imo

Yeah, I just wish if we were changing this time in increments of 2 hours it would be going the other way. I think a shorter feedback loop would be dramatically beneficial for contributors, even if it was still the kind of thing where you only got 2 tries in a work day (versus the current ~1 try).

If we're willing to just throw up our hands and say that will never happen, then yes, two hours here or there doesn't matter that much until we start saturating the build machines.

> If we wanted to we could have flags that look at what folders have code changes and only run the expression tests with double, var, or fvar, types depending on that.

We could also have a flag (kind of like the current STAN_TEST_ROW_VECTORS) to turn this on or off. The question is then whether we'd like it to be on by default or to be something run when requested for a PR.
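Such an on/off flag could be as simple as a preprocessor switch that defaults to on. A minimal sketch: STAN_TEST_EXPRESSIONS and run_ad_checks are hypothetical names, not existing Stan Math macros or functions, and the actual mechanics of STAN_TEST_ROW_VECTORS are not reproduced here.

```cpp
#include <cassert>

// Hypothetical switch (on by default); compiling with
// -DSTAN_TEST_EXPRESSIONS=0 would skip the expression checks.
#ifndef STAN_TEST_EXPRESSIONS
#define STAN_TEST_EXPRESSIONS 1
#endif

inline constexpr bool expression_tests_enabled = STAN_TEST_EXPRESSIONS != 0;

// Hypothetical wrapper: always run the usual AD checks and add the
// expression checks only when the switch is on. Returns the failure count.
template <typename AdCheck, typename ExprCheck>
int run_ad_checks(const AdCheck& ad_check, const ExprCheck& expr_check) {
  int failures = ad_check();
  if constexpr (expression_tests_enabled) {
    failures += expr_check();
  }
  return failures;
}
```

Making the default "on" versus "on request" is then a one-line change to the #define, which is the choice being discussed.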

@SteveBronder
Collaborator Author

SteveBronder commented Apr 1, 2024

> Yeah, I just wish if we were changing this time in increments of 2 hours it would be going the other way. I think a shorter feedback loop would be dramatically beneficial for contributors, even if it was still the kind of thing where you only got 2 tries in a work day (versus the current ~1 try).

I think if we want this as a goal, we could have a "quick mode" that is the default when a PR is opened: it reads the diff, looks at the names of the changed files, then greps for and runs only the tests with the same names in the test folder, with all upstream testing turned off. Then, after an initial review, we fall over to a "test everything" mode that runs overnight.
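The file-name matching step of such a quick mode could look roughly like this. It is a sketch under assumptions: the list of changed files (e.g. taken from the PR diff) and the list of test sources are given, and a test is selected when its file name is the changed file's stem plus _test. select_quick_tests and stem are hypothetical helpers, and the paths in the test only illustrate the Stan Math layout.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: file name without directories or final extension.
std::string stem(const std::string& path) {
  const auto slash = path.find_last_of('/');
  std::string name = slash == std::string::npos ? path : path.substr(slash + 1);
  const auto dot = name.find_last_of('.');
  return dot == std::string::npos ? name : name.substr(0, dot);
}

// Select every test whose stem is "<stem of a changed file>_test".
std::vector<std::string> select_quick_tests(
    const std::vector<std::string>& changed,
    const std::vector<std::string>& all_tests) {
  std::set<std::string> wanted;
  for (const auto& file : changed) {
    wanted.insert(stem(file) + "_test");
  }
  std::vector<std::string> selected;
  for (const auto& test : all_tests) {
    if (wanted.count(stem(test)) > 0) {
      selected.push_back(test);
    }
  }
  return selected;
}
```

A change to log_sum_exp.hpp would pick up both the prim and mix log_sum_exp tests but not unrelated tests such as exp_test.cpp.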

Alternatively, we could have the tests and headers include strictly what they need. Then we could do roughly the same thing as above, but with a broader scope for which tests need to be run. We used to have a setup like that, but header errors were extremely annoying, so we opted for the "include everything" scheme.
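With strict include-what-you-use, choosing tests becomes a reverse-dependency search: start from the changed files and walk the include graph backwards until you reach test sources. A minimal breadth-first sketch, assuming the include graph has already been extracted (building it is out of scope here); affected_tests and the file names in the test are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// includes: file -> files it #includes directly (pre-parsed, hypothetical).
std::set<std::string> affected_tests(
    const std::map<std::string, std::vector<std::string>>& includes,
    const std::vector<std::string>& changed) {
  // Reverse graph: header -> files that include it.
  std::map<std::string, std::vector<std::string>> included_by;
  for (const auto& [file, deps] : includes) {
    for (const auto& dep : deps) {
      included_by[dep].push_back(file);
    }
  }
  // Breadth-first walk from the changed files through their includers.
  std::set<std::string> seen(changed.begin(), changed.end());
  std::queue<std::string> todo;
  for (const auto& c : changed) {
    todo.push(c);
  }
  while (!todo.empty()) {
    const std::string cur = todo.front();
    todo.pop();
    const auto it = included_by.find(cur);
    if (it == included_by.end()) {
      continue;
    }
    for (const auto& parent : it->second) {
      if (seen.insert(parent).second) {
        todo.push(parent);
      }
    }
  }
  // Keep only test sources (files ending in "_test.cpp").
  const std::string suffix = "_test.cpp";
  std::set<std::string> tests;
  for (const auto& f : seen) {
    if (f.size() >= suffix.size() &&
        f.compare(f.size() - suffix.size(), suffix.size(), suffix) == 0) {
      tests.insert(f);
    }
  }
  return tests;
}
```

The catch mentioned above still applies: the result is only correct if every file really includes what it uses, which is exactly what the "include everything" scheme gave up.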

> We could also have a flag (kind of like the current STAN_TEST_ROW_VECTORS) to turn this on or off, the question is then whether we'd like it to be on by default or to be something run when requested for a PR

Is there a way to tell GitHub / Jenkins "after this PR is approved, run the tests one more time with this new flag and do not allow a merge until all tests pass"? That would be a pretty easy, automated way to turn this on and off.

@WardBrian
Member

> a "quick mode" that is the default when a PR is opened that reads the diff, looks at the name of the files changed, then greps and only runs the tests with the same name in the test folder

That is already in place, sort of. It just continues on to run everything else after those initial tests. It does make some subset of possible errors fail-fast, which is nice.

Being much stricter about including what we use would let us run exactly the correct subset of tests, which would be nice. @syclik and I have talked about this a couple times, especially for the distribution tests which also take a huge chunk of time to run.

> Is there a way to tell github / jenkins "after this PR is approved run the tests one more time with this new flag and do not allow a merge until all tests pass"?

I know a lot of projects use a merge bot for things like this. When an approving review is submitted, rather than clicking merge themselves, the reviewer leaves a comment like @stan-buildbot merge, which can kick off any extra tests and automatically merge when they pass. For example, Rust does this with the @bors bot. You can see an example in action here: rust-lang/rust#123315
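The merge-bot flow boils down to a small gate. A hypothetical sketch (MergeOutcome and handle_merge_command are illustrative names, not code from bors or stan-buildbot): an unapproved PR is refused without running anything, and an approved PR is merged only if the extended test run, for example with the extra expression-test flag enabled, passes.

```cpp
#include <cassert>
#include <functional>

// Hypothetical outcomes of a "merge" command sent to a merge bot.
enum class MergeOutcome { NotApproved, Merged, BlockedByTests };

// Unapproved PRs are refused before any tests run; approved PRs are merged
// only if the extended test run reports success.
MergeOutcome handle_merge_command(bool approved,
                                  const std::function<bool()>& run_extended_tests) {
  if (!approved) {
    return MergeOutcome::NotApproved;
  }
  return run_extended_tests() ? MergeOutcome::Merged
                              : MergeOutcome::BlockedByTests;
}
```

Keeping the expensive run behind approval means it happens once per PR rather than on every push.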

@syclik
Member

syclik commented Apr 2, 2024

Just catching up. Given what I've read, I think it's sensible to do two things:

  1. Use this PR to improve testing. Yes, it slows it down a couple hours, but when we kick things over to the CI, we should be ok with some time.
  2. Separately, we should address the speed issues and really think through what needs to be tested. There are still lots of things we can do to test quickly, then test more thoroughly. Or restrict testing to things that have changed.

How do we get this moving forward? The loss of two hours of wait time to prevent upstream failures is worth it, I think.

@WardBrian
Copy link
Member

> The loss of two hours of wait time to prevent upstream failures is worth it

I just wanted to make sure it was clear: these upstream failures are in PRs, not in anything user-facing. So this prevents some developer frustration and churn here in that situation. The same thing could be prevented if users were told anywhere how to run the tests preemptively, which is what the discussion in #2736 was about.

@andrjohns
Collaborator

andrjohns commented Apr 4, 2024

Needing to compile all of the mix tests first, only to have a failure in one or two, can lead to a bit of wasted computation.

The --jumbo flag already compiles in batches; would it be possible to change the behaviour to run a given batch as soon as it's compiled? That could help things fail faster, especially if we look into tuning the size of the batches.

@WardBrian
Member

WardBrian commented Apr 4, 2024

It would be possible, but I don't think that is an easy change in the current test infrastructure, since make doesn't tell you when some targets are done. I'd be happy to be proven wrong there.

Care would need to be taken to make sure the output was still readable.

@andrjohns
Copy link
Collaborator

> It would be possible, but I don't think that is an easy change in the current test infrastructure, since make doesn't tell you when some targets are done. I'd be happy to be proven wrong there.

> Care would need to be taken to make sure the output was still readable from this

I was thinking of the steps:

  1. Generate batched test files (prob_1.cpp, prob_2.cpp, etc.), but don't compile
  2. Use Python's (or bash's) threading to call runTests.py -j1 x on batches in parallel

That way the compilation is still parallelised, but each call only builds and immediately runs one test. Whether or not this would be particularly 'clean', I'm not sure.
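The two steps above could be sketched as launching one task per pre-generated batch, where each task builds and then immediately runs its own batch, so a failing batch is known without waiting for the rest of the compilation. build_and_run is a hypothetical callback standing in for the per-batch runTests.py call; shared dependencies such as TBB or Sundials would presumably need to be built before this step to avoid filesystem races, and a real version would cap how many batches run at once.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <string>
#include <vector>

// Outcome of building and immediately running one batch.
struct BatchResult {
  std::string batch;
  bool passed;
};

// Launches every batch concurrently; each task builds then runs its batch,
// independently of the others. Results are collected in batch order.
std::vector<BatchResult> run_batches_pipelined(
    const std::vector<std::string>& batches,
    const std::function<bool(const std::string&)>& build_and_run) {
  std::vector<std::future<bool>> pending;
  for (const auto& b : batches) {
    pending.push_back(std::async(std::launch::async, build_and_run, b));
  }
  std::vector<BatchResult> results;
  for (std::size_t i = 0; i < batches.size(); ++i) {
    results.push_back({batches[i], pending[i].get()});
  }
  return results;
}
```

Collecting futures in order keeps the report readable, which addresses the output concern raised above, while the builds themselves still overlap.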

@WardBrian
Member

I think something of that flavor could work as long as anything shared between tests (like TBB or Sundials) was already built; otherwise you'd get filesystem races. Worth exploring things like that.

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| arma/arma.stan | 0.17 | 0.17 | 1.04 | 4.3% faster |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.01 | 0.01 | 1.05 | 4.76% faster |
| gp_regr/gen_gp_data.stan | 0.02 | 0.02 | 1.06 | 5.77% faster |
| gp_regr/gp_regr.stan | 0.1 | 0.1 | 1.02 | 2.33% faster |
| sir/sir.stan | 75.82 | 74.29 | 1.02 | 2.02% faster |
| irt_2pl/irt_2pl.stan | 3.59 | 3.71 | 0.97 | -3.39% slower |
| eight_schools/eight_schools.stan | 0.05 | 0.05 | 1.03 | 2.5% faster |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.23 | 0.23 | 1.01 | 0.62% faster |
| pkpd/one_comp_mm_elim_abs.stan | 17.06 | 16.73 | 1.02 | 1.94% faster |
| garch/garch.stan | 0.43 | 0.43 | 1.0 | 0.21% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.67 | 2.65 | 1.01 | 0.64% faster |
| arK/arK.stan | 1.54 | 1.55 | 0.99 | -0.69% slower |
| gp_pois_regr/gp_pois_regr.stan | 2.39 | 2.4 | 1.0 | -0.25% slower |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.69 | 8.62 | 1.01 | 0.82% faster |
| performance.compilation | 176.23 | 177.15 | 0.99 | -0.52% slower |
Mean result: 1.0147850941500054

Commit hash: bc29abd03525f49136326f8e3ab87d96c3be2e56



@SteveBronder
Collaborator Author

@WardBrian do you have any idea what is going on with the benchmark tests failing? I'm not seeing an error in the logs.

https://jenkins.flatironinstitute.org/blue/rest/organizations/jenkins/pipelines/Stan/pipelines/CmdStan%20Performance%20Tests/branches/downstream_tests/runs/953/log/?start=0

@SteveBronder
Collaborator Author

It looks like the failure is coming from

+ curl -s -S 'https://jenkins.flatironinstitute.org/job/Stan/job/CmdStan%20Performance%20Tests/job/downstream_tests/953//logText/progressiveText?start=0'
curl: (7) Failed to connect to jenkins.flatironinstitute.org port 443: Connection timed out

Seems weird?

@stan-buildbot
Copy link
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| arma/arma.stan | 0.33 | 0.43 | 0.77 | -30.14% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.01 | 0.01 | 0.82 | -21.51% slower |
| gp_regr/gen_gp_data.stan | 0.02 | 0.03 | 0.85 | -17.04% slower |
| gp_regr/gp_regr.stan | 0.1 | 0.12 | 0.84 | -19.44% slower |
| sir/sir.stan | 72.63 | 74.07 | 0.98 | -1.99% slower |
| irt_2pl/irt_2pl.stan | 4.27 | 4.32 | 0.99 | -1.05% slower |
| eight_schools/eight_schools.stan | 0.06 | 0.06 | 1.07 | 6.43% faster |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.26 | 0.25 | 1.05 | 4.78% faster |
| pkpd/one_comp_mm_elim_abs.stan | 20.72 | 20.41 | 1.02 | 1.5% faster |
| garch/garch.stan | 0.45 | 0.41 | 1.1 | 9.25% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.82 | 2.63 | 1.07 | 6.74% faster |
| arK/arK.stan | 1.89 | 1.77 | 1.06 | 6.05% faster |
| gp_pois_regr/gp_pois_regr.stan | 3.1 | 3.16 | 0.98 | -1.85% slower |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 9.16 | 8.9 | 1.03 | 2.86% faster |
| performance.compilation | 190.82 | 199.75 | 0.96 | -4.68% slower |
Mean result: 0.9728302747107814

Commit hash: fe1ecd9ed58c8acfabb1e8d59239ec6b930b570f



@SteveBronder
Collaborator Author

@WardBrian I think this is good to merge after the release is finished.

@stan-buildbot
Copy link
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| arma/arma.stan | 0.42 | 0.38 | 1.11 | 10.07% faster |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.01 | 0.01 | 1.03 | 2.61% faster |
| gp_regr/gen_gp_data.stan | 0.03 | 0.03 | 1.05 | 4.57% faster |
| gp_regr/gp_regr.stan | 0.1 | 0.1 | 1.02 | 1.84% faster |
| sir/sir.stan | 76.82 | 77.09 | 1.0 | -0.34% slower |
| irt_2pl/irt_2pl.stan | 5.14 | 4.83 | 1.06 | 6.04% faster |
| eight_schools/eight_schools.stan | 0.06 | 0.06 | 0.96 | -4.05% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.37 | 0.29 | 1.28 | 22.01% faster |
| pkpd/one_comp_mm_elim_abs.stan | 20.89 | 21.91 | 0.95 | -4.9% slower |
| garch/garch.stan | 0.51 | 0.51 | 1.01 | 0.71% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.91 | 2.96 | 0.99 | -1.47% slower |
| arK/arK.stan | 2.06 | 2.06 | 1.0 | -0.15% slower |
| gp_pois_regr/gp_pois_regr.stan | 3.19 | 3.21 | 1.0 | -0.47% slower |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 9.77 | 9.89 | 0.99 | -1.28% slower |
| performance.compilation | 199.37 | 205.47 | 0.97 | -3.06% slower |
Mean result: 1.0271448709133995

Commit hash: d669f2b029575d5191b8f394b58b652e4bc4c805



Development

Successfully merging this pull request may close these issues.

Document Stan Math expressions requirements
6 participants